Given two strings word1 and word2, return the minimum number of operations required to convert word1 to word2.
You have the following three operations permitted on a word:
- Insert a character
- Delete a character
- Replace a character
Example 1:
Input: word1 = "horse", word2 = "ros" Output: 3 Explanation: horse -> rorse (replace 'h' with 'r') rorse -> rose (remove 'r') rose -> ros (remove 'e')
Example 2:
Input: word1 = "intention", word2 = "execution" Output: 5 Explanation: intention -> inention (remove 't') inention -> enention (replace 'i' with 'e') enention -> exention (replace 'n' with 'x') exention -> exection (replace 'n' with 'c') exection -> execution (insert 'u')
Constraints:
0 <= word1.length, word2.length <= 500word1andword2consist of lowercase English letters.
Average Rating: 3.82 (98 votes)
Solution
Intuition
The edit distance algorithm is very popular among the data scientists. It's one of the basic algorithms used for evaluation of machine translation and speech recognition.
The naive approach would be to check for all possible edit
sequences and choose the shortest one in-between.
That would result in an exponential complexity and it's an overkill
since we actually don't need to have all possible edit sequences
but just the shortest one.
Approach 1: Dynamic Programming
The idea would be to reduce the problem to simple ones.
For example, there are two words, horse and ros and we want to compute
an edit distance D for them. One could notice that it seems to be
more simple for short words and so it would be logical to relate
an edit distance D[n][m] with the lengths n and m of input words.
Let's go further and introduce an edit distance D[i][j] which is
an edit distance between the first i characters of word1 and
the first j characters of word2.
It turns out that one could compute D[i][j], knowing
D[i - 1][j], D[i][j - 1] and D[i - 1][j - 1].
There is just one more character to add into one or both strings and the formula is quite obvious.
If the last character is the same, i.e. word1[i] = word2[j] then
D[i][j]=1+min(D[i−1][j],D[i][j−1],D[i−1][j−1]−1)
and if not, i.e. word1[i] != word2[j] we have to
take into account the replacement of the last character
during the conversion.
D[i][j]=1+min(D[i−1][j],D[i][j−1],D[i−1][j−1])
So each step of the computation would be done based on the previous computation, as follows:
The obvious base case is an edit distance between the empty string and
non-empty string that means D[i][0] = i and D[0][j] = j.
Now we have everything to actually proceed to the computations
Complexity Analysis
- Time complexity : O(mn) as it follows quite straightforward for the inserted loops.
- Space complexity : O(mn) since at each step we keep the results of all previous computations.
Last Edit: January 11, 2019 6:08 AM
Thanks to the guy who contributes to this answer, but when word[i] == word[j], it's simply dp[i][j] = dp[i-1][j-1]. The way the answer is written might look more consistent in two cases but it's more confusing.
Last Edit: March 9, 2019 8:29 AM
Reaaaaally ? You guys don't even mention in the solution that it's a good old Levenshtein distance ?
Please if you use some known algorithm, at least mention it!
Last Edit: December 8, 2018 12:52 AM
Plain English Description w/ JavaScript Solution:
Effectively, what the solution above is describing is the creation of a matrix/table that has inputs for all preceding inputs. One crucial step here is that the "base case" starts off with the empty String.
IE: When comparing "" vs "kitten", we have 6 changes, because we need to insert every character of kitten. So, 1 for "" vs "k", 2 for "" vs "ki", etc. We do the same thing by comparing the empty string to "sitting".
Let's start by iterating over our Strings, i for word1, and j for word2, and comparing each string to ' '. Our initial table should look like.
? | ' ' | k | i | t | t | e | n
' ' | 0 | 1 | 2 | 3 | 4 | 5 | 6
s | 1
i | 2
t | 3
t | 4
i | 5
n | 6
g | 7
Great! We have the foundation to build our solution. So comparing "k" to "s", we first want to check what the minimum solution of the previous strings were (' ' vs ' ', ' ' vs 'k', ' ' vs 's' ). We see that the minimum prior solution is in the diagonal of our table. So, we retrieve this value (table[i - 1][j - 1] which is currently 0) and use it. But since k does not equal s, we add 1, IE, our first step in comparing k vs s should look like:
? | ' ' | k | i | t | t | e | n
' ' | 0 | 1 | 2 | 3 | 4 | 5 | 6
s | 1 | 1 |
i | 2
t | 3
t | 4
i | 5
n | 6
g | 7
We can follow this same logic moving forward, comparing the previous solutions (left, diagonal, top with respect to table cells) and adding 1. I've filled in the table up to where we encounter "ki" vs "si". This is where we have to apply a slight modification.
? | ' ' | k | i | t | t | e | n
' ' | 0 | 1 | 2 | 3 | 4 | 5 | 6
s | 1 | 1 | 2 | 3 | 4 | 5 | 6
i | 2 | 1 |
t | 3
t | 4
i | 5
n | 6
g | 7
Because the ith and jth character match one another, we simply use the diagonal value as there are no new changes required. IE: "ki" vs "si" has the same solution as "k" vs "s" since the "i" is the same.
? | ' ' | k | i | t | t | e | n
' ' | 0 | 1 | 2 | 3 | 4 | 5 | 6
s | 1 | 1 | 2 | 3 | 4 | 5 | 6
i | 2 | 1 | 1
t | 3
t | 4
i | 5
n | 6
g | 7
We can continue with this logic filling in our table, and at the end our answer will be in the last diagonal cell! Here is the code:
/**
* @param {string} word1
* @param {string} word2
* @return {number}
*/
var minDistance = function(word1, word2) {
const dist = [[]];
// we've added plus one to account for the empty string
for (let i = 0; i <= word1.length; i++) {
if (!dist[i]) { // because initializing empty Arrays with current sizes in JavaScript is weird
dist[i] = [];
}
for (let j = 0; j <= word2.length; j++) {
if (i === 0) {
// i is the empty string, so 1 insert per char of j
dist[i][j] = j;
} else if (j === 0) {
// j is the empty string, so 1 insert per char of i
dist[i][j] = i;
} else if (word1[i - 1] === word2[j - 1]) {
// the last character matches, so use the diagonal
// this is because the no changes required from previous
dist[i][j] = dist[i - 1][j - 1];
} else {
dist[i][j] = Math.min(1 + Math.min(
dist[i - 1][j],
dist[i][j - 1],
dist[i - 1][j - 1]
));
}
}
}
return dist[word1.length][word2.length];
};
Hope this helps someone!
January 10, 2019 6:41 AM
Please do not provide spaghetti code like this:
if (n * m == 0)
return n + m;
just write two if statements, which are much better.
Best explanation of the algorithm https://medium.com/@ethannam/understanding-the-levenshtein-distance-equation-for-beginners-c4285a5604f0
March 26, 2019 10:37 AM
For all the people who want to have a better understanding of this problem I will refer to Algorithm Design Manual book by Steven Skiena page 282 or section 8.2.2.
For all those who are stuck can refer this video https://youtu.be/WgmZ-5qAHJ8 . Found this really useful and learnt how to approach such DP Problems.
November 24, 2018 10:16 AM
Hi, @andvary , I'm very curious about how the short video in this solution is made.
Because I have been trying to write solutions, I always wanted to make such a video to describe my ideas.
Can you tell me what the name of this video is, and it would be better if you could recommend a software for making this video.
Thanks a lot.
April 21, 2020 3:40 AM
You get -10000 points for the poor naming of variables. Here, a much nicer version of the algorithm:
https://www.***.org/edit-distance-dp-5/
Last Edit: December 8, 2018 10:46 PM
O(n)-space Python:
n = len(word1)
dp = [i for i in range(n + 1)]
for i in range(1, len(word2) + 1):
p, dp[0] = dp[0], i
for j in range(1, n + 1):
t = dp[j]
dp[j] = p if word1[j - 1] == word2[i - 1] else min(p, dp[j], dp[j - 1]) + 1
p = t
return dp[n]
xxxxxxxxxxclass Solution {public: int minDistance(string word1, string word2) { int m = word1.size(); int n = word2.size(); int dp[m+1][n+1]; for(int i=0;i<m+1;i++){ for(int j=0;j<n+1;j++){ if(i==0) dp[i][j] = j; if(j==0) dp[i][j] = i; } } for(int i=1;i<m+1;i++) { for(int j=1;j<n+1;j++) { if(word1[i-1]==word2[j-1]) dp[i][j] = dp[i-1][j-1]; else dp[i][j] = min(dp[i-1][j-1], min(dp[i-1][j], dp[i][j-1]))+1; } } return dp[m][n]; }};
